Smart Paper Notes

SMAuC -- The Scientific Multi-Authorship Corpus

Philipp Sauer , Janek Bevendorff , Lukas Gienapp , Wolfgang Kircheis , Erik Körner , Benno Stein , Martin Potthast

Category: Natural Language Processing

2022-11-04

With an ever-growing number of new publications each day, scientific writing poses an interesting domain for authorship analysis of both single-author and multi-author documents. Unfortunately, most existing corpora lack either material from the science domain or the required metadata. Hence, we present SMAuC, a new metadata-rich corpus designed specifically for authorship analysis in scientific writing. With more than three million publications from various scientific disciplines, SMAuC is the largest openly available corpus for authorship analysis to date. It combines a wide and diverse range of scientific texts from the humanities and natural sciences with rich and curated metadata, including unique and carefully disambiguated author IDs. We hope SMAuC will contribute significantly to advancing the field of authorship analysis in the science domain.

An operational framework to automatically evaluate the quality of weather observations from third-party stations

Quanxi Shao , Ming Li , Joel Janek Dabrowski , Shuvo Bakar , Ashfaqur Rahman , Andrea Powell , Brent Henderson

Category: Machine Learning (Statistics)

2022-12-05

With an increasing number of crowdsourced private automatic weather stations (called TPAWS) established to fill gaps in the official network and obtain local weather information for various purposes, data quality is a major concern in promoting their usage. Proper quality control and assessment are necessary to reach mutual agreement on the TPAWS observations. To derive near real-time assessments for an operational system, we propose a simple, scalable and interpretable framework based on AI/Stats/ML models. The framework constructs separate models for individual data from official sources and then provides the final assessment by fusing the individual models. The performance of our proposed framework is evaluated on synthetic data and demonstrated by applying it to a real TPAWS network.

Bayesian Physics Informed Neural Networks for Data Assimilation and Spatio-Temporal Modelling of Wildfires

Joel Janek Dabrowski , Daniel Edward Pagendam , James Hilton , Conrad Sanderson , Daniel MacKinlay , Carolyn Huston , Andrew Bolt , Petra Kuhnert

Category: Machine Learning

2022-12-02

We apply Physics Informed Neural Networks (PINNs) to the problem of wildfire fire-front modelling. The PINN is an approach that integrates a differential equation into the optimisation loss function of a neural network to guide the neural network to learn the physics of a problem. We apply the PINN to the level-set equation, which is a Hamilton-Jacobi partial differential equation that models a fire-front with the zero-level set. This results in a PINN that simulates a fire-front as it propagates through a spatio-temporal domain. We demonstrate the agility of the PINN to learn physical properties of a fire under extreme changes in external conditions (such as wind) and show that this approach encourages continuity of the PINN's solution across time. Furthermore, we demonstrate how data assimilation and uncertainty quantification can be incorporated into the PINN in the wildfire context. This is a significant contribution to wildfire modelling as the level-set method -- which is a standard solver for the level-set equation -- does not naturally provide this capability.
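
To make the idea of putting a differential equation into the loss concrete, here is a minimal sketch of the physics residual that such a loss would penalise. It assumes the common level-set form phi_t + s * |grad phi| = 0 with a constant spread speed s, and it uses finite differences on an analytic circular front in place of a neural network with automatic differentiation. The function names, the constant speed, and the test front are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def circular_front(x, y, t, r0=1.0, speed=0.5):
    """Signed distance to a circle of radius r0 + speed * t (zero level set = fire-front)."""
    return np.sqrt(x**2 + y**2) - (r0 + speed * t)

def level_set_residual(phi, x, y, t, speed=0.5, h=1e-4):
    """Residual of phi_t + speed * |grad phi| = 0, estimated with central differences."""
    phi_t = (phi(x, y, t + h) - phi(x, y, t - h)) / (2 * h)
    phi_x = (phi(x + h, y, t) - phi(x - h, y, t)) / (2 * h)
    phi_y = (phi(x, y + h, t) - phi(x, y - h, t)) / (2 * h)
    return phi_t + speed * np.sqrt(phi_x**2 + phi_y**2)

def physics_loss(phi, x, y, t, speed=0.5):
    """Mean squared PDE residual over collocation points (the physics term of a PINN-style loss)."""
    return float(np.mean(level_set_residual(phi, x, y, t, speed) ** 2))

rng = np.random.default_rng(0)
# Collocation points kept away from the origin, where the distance function is not differentiable.
x = rng.uniform(0.5, 2.0, size=256)
y = rng.uniform(0.5, 2.0, size=256)
t = rng.uniform(0.0, 1.0, size=256)

# A front that moves at the assumed speed satisfies the equation; a slower front does not.
loss_true = physics_loss(circular_front, x, y, t)
loss_wrong = physics_loss(lambda x, y, t: circular_front(x, y, t, speed=0.1), x, y, t)
```

In an actual PINN, `phi` would be the network output, the derivatives would come from automatic differentiation, and this term would be combined with data and boundary terms before being minimised.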

Bayesian Neural Network Inference via Implicit Models and the Posterior Predictive Distribution

Joel Janek Dabrowski , Daniel Edward Pagendam

Category: Machine Learning (Statistics) | Machine Learning

2022-09-06

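We propose a novel approach to perform approximate Bayesian inference in complex models such as Bayesian neural networks. The approach is more scalable to large data than Markov Chain Monte Carlo, it embraces more expressive models than Variational Inference, and it does not rely on adversarial training (or density ratio estimation). We adopt the recent approach of constructing two models: (1) a primary model, tasked with performing regression or classification; and (2) a secondary, expressive (e.g. implicit) model that defines an approximate posterior distribution over the parameters of the primary model. However, we optimise the parameters of the posterior model via gradient descent according to a Monte Carlo estimate of the posterior predictive distribution -- which is our only approximation (other than the posterior model). Only a likelihood needs to be specified, which can take various forms such as loss functions and synthetic likelihoods, thus providing a form of likelihood-free approach. Furthermore, we formulate the approach such that the posterior samples can either be independent of, or conditionally dependent upon, the inputs to the primary model. The latter approach is shown to be capable of increasing the apparent complexity of the primary model. We see this being useful in applications such as surrogate and physics-based models. To promote how the Bayesian paradigm offers more than just uncertainty quantification, we demonstrate: uncertainty quantification, multi-modality, and an application with a recent forecasting neural network architecture.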
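
The objective described above can be sketched as follows: draw M parameter samples from the posterior model, average the likelihood over them, and take the log. This is only an illustration under stated assumptions. The linear primary model, the shift-and-scale posterior sampler, the Gaussian likelihood with fixed noise, and all names are hypothetical stand-ins, and the sketch evaluates the objective without the gradient-descent step.

```python
import numpy as np

def gaussian_log_lik(y, mean, sigma=0.1):
    """Log density of y under N(mean, sigma^2)."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - mean) / sigma) ** 2

def primary_model(x, theta):
    """Toy primary model: a line with slope theta[..., 0] and intercept theta[..., 1]."""
    return theta[..., 0:1] * x + theta[..., 1:2]

def posterior_sampler(phi, noise):
    """Toy posterior model: maps noise to parameter samples by shifting and scaling."""
    mu, log_scale = phi
    return mu + np.exp(log_scale) * noise

def log_posterior_predictive(x, y, theta_samples):
    """Monte Carlo estimate of sum_n log( (1/M) * sum_m p(y_n | x_n, theta_m) )."""
    log_lik = gaussian_log_lik(y, primary_model(x, theta_samples))  # shape (M, N)
    m = log_lik.shape[0]
    # Numerically stable log-mean-exp over the M posterior samples
    max_ll = log_lik.max(axis=0)
    log_mean = max_ll + np.log(np.exp(log_lik - max_ll).sum(axis=0)) - np.log(m)
    return float(log_mean.sum())

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
y = 2.0 * x + 0.5

noise = rng.standard_normal((64, 2))
good = posterior_sampler((np.array([2.0, 0.5]), np.log(0.01)), noise)
bad = posterior_sampler((np.array([0.0, 0.0]), np.log(0.01)), noise)

score_good = log_posterior_predictive(x, y, good)
score_bad = log_posterior_predictive(x, y, bad)
```

Posterior parameters that place samples near the data-generating line score higher, which is the signal a gradient-based optimiser would follow when this quantity is differentiated with respect to `phi`.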

Tackling Neural Architecture Search With Quality Diversity Optimization

Lennart Schneider , Florian Pfisterer , Paul Kent , Juergen Branke , Bernd Bischl , Janek Thomas

Category: Machine Learning | Neural and Evolutionary Computing | Machine Learning (Statistics)

2022-07-30

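Neural architecture search (NAS) has been studied extensively and has grown to become a research field with substantial impact. While classical single-objective NAS searches for the architecture with the best performance, multi-objective NAS considers multiple objectives that should be optimized simultaneously, e.g., minimizing resource usage along with the validation error. Although considerable progress has been made in the field of multi-objective NAS, we argue that there is some discrepancy between the actual optimization problem of practical interest and the optimization problem that multi-objective NAS tries to solve. We resolve this discrepancy by formulating the multi-objective NAS problem as a quality diversity optimization (QDO) problem and introduce three quality diversity NAS optimizers (two of them belonging to the group of multifidelity optimizers), which search for high-performing yet diverse architectures that are optimal for application-specific niches, e.g., hardware constraints. By comparing these optimizers to their multi-objective counterparts, we demonstrate that quality diversity NAS in general outperforms multi-objective NAS with respect to quality of solutions and efficiency. We further show how applications and future NAS research can thrive on QDO.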

AMLB: an AutoML Benchmark

Pieter Gijsbers , Marcos L. P. Bueno , Stefan Coors , Erin LeDell , Sébastien Poirier , Janek Thomas , Bernd Bischl , Joaquin Vanschoren

Category: Machine Learning | Machine Learning (Statistics)

2022-07-25

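Comparing different AutoML frameworks is challenging and often done incorrectly. We introduce an open and extensible benchmark that follows best practices and avoids common mistakes when comparing AutoML frameworks. We conduct a thorough comparison of 9 well-known AutoML frameworks across 71 classification and 33 regression tasks. The differences between the AutoML frameworks are explored with a multi-faceted analysis, evaluating model accuracy, its trade-offs with inference time, and framework failures. We also use Bradley-Terry trees to discover subsets of tasks where the relative AutoML framework rankings differ. The benchmark comes with an open-source tool that integrates with many AutoML frameworks and automates the empirical evaluation process end-to-end: from framework installation and resource allocation to in-depth evaluation. The benchmark uses public data sets, can be easily extended with other AutoML frameworks and tasks, and has a website with up-to-date results.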

A Spatio-Temporal Neural Network Forecasting Approach for Emulation of Firefront Models

Andrew Bolt , Carolyn Huston , Petra Kuhnert , Joel Janek Dabrowski , James Hilton , Conrad Sanderson

Category: Machine Learning

2022-06-17

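Computational simulations of wildfire spread typically employ empirical rate-of-spread calculations under various conditions (such as terrain, fuel type, weather). Small perturbations in conditions can often lead to significant changes in fire spread (such as speed and direction), necessitating a computationally expensive large set of simulations to quantify uncertainty. Model emulation seeks alternative representations of physical models using machine learning, aiming to provide more efficient and/or simplified surrogate models. We propose a dedicated spatio-temporal neural network for model emulation, able to capture the complex behaviour of fire spread models. The proposed approach can approximate forecasts at fine spatial and temporal resolutions that are often challenging for neural network based approaches. Furthermore, the proposed approach is robust even with small training sets, due to novel data augmentation methods. Empirical experiments show good agreement between simulated and emulated firefronts, with an average Jaccard score of 0.76.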
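
For readers unfamiliar with the reported metric, the Jaccard score is the intersection over union of two sets. The sketch below assumes it is computed on boolean burned-area masks over a grid; the abstract does not state the exact representation, and the function name and the toy 4x4 grids are illustrative.

```python
import numpy as np

def jaccard_score(burned_a, burned_b):
    """Intersection over union of two boolean burned-area masks."""
    a = np.asarray(burned_a, dtype=bool)
    b = np.asarray(burned_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(np.logical_and(a, b).sum() / union)

simulated = np.zeros((4, 4), dtype=bool)
emulated = np.zeros((4, 4), dtype=bool)
simulated[0:2, 0:2] = True   # 4 burned cells
emulated[0:2, 0:3] = True    # 6 burned cells, 4 of them shared with the simulation
score = jaccard_score(simulated, emulated)  # 4 shared cells / 6 cells in the union
```

A score of 1 means the two burned areas coincide exactly, and 0 means they do not overlap at all.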

Multi-Objective Hyperparameter Optimization -- An Overview

Florian Karl , Tobias Pielok , Julia Moosbauer , Florian Pfisterer , Stefan Coors , Martin Binder , Lennart Schneider , Janek Thomas , Jakob Richter , Michel Lang

Category: Machine Learning | Machine Learning (Statistics)

2022-06-15

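Hyperparameter optimization constitutes a large part of typical modern machine learning workflows. This arises from the fact that machine learning methods and corresponding preprocessing steps often only yield optimal performance when hyperparameters are properly tuned. But in many applications, we are not only interested in optimizing ML pipelines solely for predictive accuracy; additional metrics or constraints must be considered when determining an optimal configuration, resulting in a multi-objective optimization problem. This is often neglected in practice, due to a lack of knowledge and of readily available software implementations for multi-objective hyperparameter optimization. In this work, we introduce the reader to the basics of multi-objective hyperparameter optimization and motivate its usefulness in applied ML. Furthermore, we provide an extensive survey of existing optimization strategies, both from the domain of evolutionary algorithms and Bayesian optimization. We illustrate the utility of MOO in several specific ML applications, considering objectives such as operating conditions, prediction time, sparseness, fairness, interpretability and robustness.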
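
In a multi-objective setting there is usually no single best configuration, only a set of non-dominated trade-offs (the Pareto front). The following minimal sketch filters a handful of hypothetical configurations scored on two objectives to be minimised, validation error and prediction time in milliseconds. The numbers and function names are made up for illustration and do not come from the survey.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on at least one."""
    return all(ai <= bi for ai, bi in zip(a, b)) and any(ai < bi for ai, bi in zip(a, b))

def pareto_front(points):
    """Return the non-dominated points, preserving input order (all objectives minimised)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# (validation error, prediction time in ms) for five hypothetical configurations
configs = [
    (0.10, 50.0),
    (0.12, 20.0),
    (0.15, 25.0),  # dominated by (0.12, 20.0)
    (0.20, 5.0),
    (0.10, 60.0),  # dominated by (0.10, 50.0)
]
front = pareto_front(configs)
```

This quadratic-time filter is fine for small candidate sets; the evolutionary and Bayesian optimization strategies mentioned in the abstract address how to search for such fronts efficiently.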

Architectural patterns for handling runtime uncertainty of data-driven models in safety-critical perception

Janek Groß , Rasmus Adler , Michael Kläs , Jan Reich , Lisa Jöckel , Roman Gansch

Category: Artificial Intelligence | Machine Learning

2022-06-14

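Data-driven models (DDMs) based on machine learning and other AI techniques play an important role in the perception of increasingly autonomous systems. Because their behavior is defined only implicitly, mainly by the data used for training, DDM outputs are subject to uncertainty. This poses a challenge for realizing safety-critical perception tasks by means of DDMs. A promising approach to tackling this challenge is to estimate the uncertainty in the current situation during operation and adapt the system behavior accordingly. In previous work, we focused on runtime estimation of uncertainty and discussed approaches for handling uncertainty estimations. In this paper, we present additional architectural patterns for handling uncertainty. Furthermore, we evaluate them qualitatively and quantitatively with respect to safety and performance gains. For the quantitative evaluation, we consider a distance controller for vehicle platooning, where performance gains are measured by how much the distance can be reduced in different operational situations. We conclude that considering context information about the driving situation makes it possible to accept more or less uncertainty depending on the inherent risk of the situation, which results in performance gains.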

A Collection of Quality Diversity Optimization Problems Derived from Hyperparameter Optimization of Machine Learning Models

Lennart Schneider , Florian Pfisterer , Janek Thomas , Bernd Bischl

Category: Machine Learning

2022-04-28

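The goal of Quality Diversity Optimization is to generate a collection of diverse yet high-performing solutions to a given problem. Typical benchmark problems are, for example, finding a repertoire of robot arm configurations or a collection of game-playing strategies. In this paper, we propose a set of Quality Diversity Optimization problems that tackle hyperparameter optimization of machine learning models -- a so far underexplored application of Quality Diversity Optimization. Our benchmark problems involve novel feature functions, such as interpretability or resource usage of models. To allow for fast and efficient benchmarking, we build upon YAHPO Gym, a recently proposed open-source benchmarking suite for hyperparameter optimization that makes use of high-performing surrogate models and returns these surrogate model predictions instead of evaluating the true expensive black-box function. We present results of an initial experimental study comparing different Quality Diversity optimizers on our benchmark problems. Furthermore, we discuss future directions and challenges of Quality Diversity Optimization in the context of hyperparameter optimization.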
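
As a rough illustration of what "diverse yet high-performing" means operationally, here is a minimal MAP-Elites-style loop, one widely used quality diversity algorithm. The abstract does not name the optimizers it compares, so this choice is an assumption. The objective, the feature function (a stand-in for something like model size), the mutation scheme, and all names are toy placeholders and do not use YAHPO Gym.

```python
import random

def evaluate(config):
    """Toy stand-ins: an error to minimise and a feature value (e.g. model size) in [0, 1)."""
    lr, size = config
    error = (lr - 0.3) ** 2 + 0.1 * (1.0 - size)
    return error, size

def niche_of(feature, n_niches=5):
    """Map a feature value in [0, 1) to one of n_niches equal-width bins."""
    return min(int(feature * n_niches), n_niches - 1)

def map_elites(n_iters=500, n_niches=5, seed=0):
    rng = random.Random(seed)
    archive = {}  # niche index -> (error, config) of the best solution found in that niche
    for _ in range(n_iters):
        if archive and rng.random() < 0.8:
            # Mutate an elite drawn from a randomly chosen occupied niche
            _, (lr, size) = rng.choice(list(archive.values()))
            config = (min(max(lr + rng.gauss(0, 0.1), 0.0), 1.0),
                      min(max(size + rng.gauss(0, 0.1), 0.0), 0.999))
        else:
            # Otherwise sample a fresh random configuration
            config = (rng.random(), rng.random() * 0.999)
        error, feature = evaluate(config)
        niche = niche_of(feature, n_niches)
        # Keep the candidate only if its niche is empty or it beats the current elite
        if niche not in archive or error < archive[niche][0]:
            archive[niche] = (error, config)
    return archive

archive = map_elites()
```

The result is one elite per feature bin rather than a single optimum, so a user with a particular resource budget can pick the best configuration found for that niche.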
